{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# DAgger\n",
    "\n",
    "\n",
    "__DAgger (Dataset Aggregation)__ is one of the most widely used imitation learning algorithms. Let's understand how DAgger works with an example. Suppose we want to train an agent to drive a car. First, we initialize an empty dataset $\\mathcal{D}$.\n",
    "\n",
    "__In the first iteration__, we start off with some policy $\\pi_1$ and drive the car, generating a trajectory $\\tau$. The trajectory consists of the states visited by $\\pi_1$ and the actions $\\pi_1$ took in those states. Now, we create a new dataset $\\mathcal{D}_1$ by keeping only the states visited by $\\pi_1$ and asking an expert to provide the action for each of those states. The actions chosen by $\\pi_1$ itself are discarded.\n",
    "\n",
    "\n",
    "\n",
    "Now, we combine the new dataset $\\mathcal{D}_1 $ with our initialized empty dataset $\\mathcal{D} $ and update $\\mathcal{D}$ as: \n",
    "\n",
    "$$ \\mathcal{D} = \\mathcal{D} \\cup  \\mathcal{D}_1 $$\n",
    "\n",
    "\n",
    "Next, we train a classifier on this updated dataset $\\mathcal{D}$ to predict the expert's action from the state. The trained classifier is our new policy $\\pi_2$.\n",
    "\n",
    "__In the second iteration__, we use the new policy $\\pi_2$ to generate trajectories and create a new dataset $\\mathcal{D}_2$ by taking only the states visited by $\\pi_2$ and asking the expert to provide the actions for those states.\n",
    "\n",
    "Now, we combine the dataset $\\mathcal{D}_2 $ with  $\\mathcal{D} $  and update $\\mathcal{D} $  as:\n",
    "\n",
    "$$\\mathcal{D} = \\mathcal{D} \\cup  \\mathcal{D}_2 $$\n",
    "\n",
    "\n",
    "Next, we train a classifier on this updated dataset $\\mathcal{D}$ and learn a new policy $\\pi_3$. \n",
    "\n",
    "\n",
    "__In the third iteration__, we use the new policy $\\pi_3$ to generate trajectories and create a new dataset $\\mathcal{D}_3 $  by taking only the states visited by the new policy $\\pi_3$ and then we ask the expert to provide the actions for those states.\n",
    "\n",
    "\n",
    "Now, we combine the dataset $\\mathcal{D}_3 $ with  $\\mathcal{D} $  and update $\\mathcal{D} $  as:\n",
    "\n",
    "\n",
    "$$\\mathcal{D} = \\mathcal{D} \\cup  \\mathcal{D}_3 $$\n",
    "\n",
    "Next, we train a classifier on this updated dataset $\\mathcal{D}$ and learn a new policy $\\pi_4$. In this way, DAgger works in a series of iterations, training each new policy on all the expert-labeled data collected so far, until it arrives at a policy that imitates the expert well.\n",
    "\n",
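    "The iterations above all follow the same loop. As a minimal sketch (not a definitive implementation), the toy example below replaces driving with moving along a line of cells. The names `expert_action`, `rollout`, `train_classifier`, and `dagger`, the constants `TARGET` and `N_STATES`, and the majority-vote lookup table standing in for the classifier are all assumptions made for illustration:\n",
    "\n",
    "```python\n",
    "import random\n",
    "from collections import Counter, defaultdict\n",
    "\n",
    "TARGET = 5      # hypothetical goal cell on a line of cells 0..10\n",
    "N_STATES = 11\n",
    "\n",
    "def expert_action(state):\n",
    "    # Hypothetical expert: always step toward TARGET (-1, 0, or +1).\n",
    "    return (TARGET > state) - (TARGET < state)\n",
    "\n",
    "def rollout(policy, start, horizon=8):\n",
    "    # Run the current policy and record only the states it visits.\n",
    "    states, s = [], start\n",
    "    for _ in range(horizon):\n",
    "        states.append(s)\n",
    "        s = min(max(s + policy(s), 0), N_STATES - 1)\n",
    "    return states\n",
    "\n",
    "def train_classifier(dataset):\n",
    "    # Stand-in classifier: majority expert label per state, 0 for unseen states.\n",
    "    votes = defaultdict(Counter)\n",
    "    for s, a in dataset:\n",
    "        votes[s][a] += 1\n",
    "    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}\n",
    "    return lambda s: table.get(s, 0)\n",
    "\n",
    "def dagger(n_iterations=5, seed=0):\n",
    "    rng = random.Random(seed)\n",
    "    dataset = []              # D starts empty\n",
    "    policy = lambda s: 1      # pi_1: an arbitrary initial policy (always move right)\n",
    "    for _ in range(n_iterations):\n",
    "        states = rollout(policy, rng.randrange(N_STATES))\n",
    "        new_data = [(s, expert_action(s)) for s in states]  # D_i: expert labels the visited states\n",
    "        dataset.extend(new_data)                            # aggregate: D <- D union D_i\n",
    "        policy = train_classifier(dataset)                  # pi_{i+1}: trained on all of D\n",
    "    return policy, dataset\n",
    "\n",
    "policy, dataset = dagger()\n",
    "```\n",
    "\n",
    "Here the aggregated dataset is kept as a list, so repeated state-action pairs are retained. Because every label comes from the expert, the learned policy agrees with the expert on every state that appears in $\\mathcal{D}$.\n",
    "\n",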
    "__Now that we have a basic understanding of DAgger, let's get into more details and learn how DAgger finds the optimal policy.__"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
